68 research outputs found

    Querying websites using compact skeletons

    Get PDF
    AbstractSeveral commercial applications, such as online comparison shopping and process automation, require integrating information that is scattered across multiple websites or XML documents. Much research has been devoted to this problem, resulting in several research prototypes and commercial implementations. Such systems rely on wrappers that provide relational or other structured interfaces to websites. Traditionally, wrappers have been constructed by hand on a per-website basis, constraining the scalability of the system. We introduce a website structure inference mechanism called compact skeletons that is a step in the direction of automated wrapper generation. Compact skeletons provide a transformation from websites or other hierarchical data, such as XML documents, to relational tables. We study several classes of compact skeletons and provide polynomial-time algorithms and heuristics for automated construction of compact skeletons from websites. Experimental results show that our heuristics work well in practice. We also argue that compact skeletons are a natural extension of commercially deployed techniques for wrapper construction

    SEISMIC: A Self-Exciting Point Process Model for Predicting Tweet Popularity

    Full text link
    Social networking websites allow users to create and share content. Big information cascades of post resharing can form as users of these sites reshare others' posts with their friends and followers. One of the central challenges in understanding such cascading behaviors is in forecasting information outbreaks, where a single post becomes widely popular by being reshared by many users. In this paper, we focus on predicting the final number of reshares of a given post. We build on the theory of self-exciting point processes to develop a statistical model that allows us to make accurate predictions. Our model requires no training or expensive feature engineering. It results in a simple and efficiently computable formula that allows us to answer questions, in real-time, such as: Given a post's resharing history so far, what is our current estimate of its final number of reshares? Is the post resharing cascade past the initial stage of explosive growth? And, which posts will be the most reshared in the future? We validate our model using one month of complete Twitter data and demonstrate a strong improvement in predictive accuracy over existing approaches. Our model gives only 15% relative error in predicting final size of an average information cascade after observing it for just one hour.Comment: 10 pages, published in KDD 201

    Muppet: MapReduce-Style Processing of Fast Data

    Full text link
    MapReduce has emerged as a popular method to process big data. In the past few years, however, not just big data, but fast data has also exploded in volume and availability. Examples of such data include sensor data streams, the Twitter Firehose, and Facebook updates. Numerous applications must process fast data. Can we provide a MapReduce-style framework so that developers can quickly write such applications and execute them over a cluster of machines, to achieve low latency and high scalability? In this paper we report on our investigation of this question, as carried out at Kosmix and WalmartLabs. We describe MapUpdate, a framework like MapReduce, but specifically developed for fast data. We describe Muppet, our implementation of MapUpdate. Throughout the description we highlight the key challenges, argue why MapReduce is not well suited to address them, and briefly describe our current solutions. Finally, we describe our experience and lessons learned with Muppet, which has been used extensively at Kosmix and WalmartLabs to power a broad range of applications in social media and e-commerce.Comment: VLDB201

    To what extent could performance-based schemes help increase the effectiveness of prevention of mother-to-child transmission of HIV (PMTCT) programs in resource-limited settings? a summary of the published evidence

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>In resource-limited settings, HIV/AIDS remains a serious threat to the social and physical well-being of women of childbearing age, pregnant women, mothers and infants.</p> <p>Discussion</p> <p>In sub-Saharan African countries with high prevalence rates, pediatric HIV/AIDS acquired through mother-to-child transmission (MTCT) can in largely be prevented by using well-established biomedical interventions. Logistical and socio-cultural barriers continue, however, to undermine the successful prevention of MTCT (PMTCT). In this paper, we review reports on maternal, neonatal and child health, as well as HIV care and treatment services that look at program incentives.</p> <p>Summary</p> <p>These studies suggest that comprehensive PMTCT strategies aiming to maximize health-worker motivation in developing countries must involve a mix of both financial and non-financial incentives. The establishment of robust ethical and regulatory standards in public-sector HIV care centers could reduce barriers to PMTCT service provision in sub-Saharan Africa and help them in achieving universal PMTCT targets.</p

    Guidelines for the use and interpretation of assays for monitoring autophagy (3rd edition)

    Get PDF
    In 2008 we published the first set of guidelines for standardizing research in autophagy. Since then, research on this topic has continued to accelerate, and many new scientists have entered the field. Our knowledge base and relevant new technologies have also been expanding. Accordingly, it is important to update these guidelines for monitoring autophagy in different organisms. Various reviews have described the range of assays that have been used for this purpose. Nevertheless, there continues to be confusion regarding acceptable methods to measure autophagy, especially in multicellular eukaryotes. For example, a key point that needs to be emphasized is that there is a difference between measurements that monitor the numbers or volume of autophagic elements (e.g., autophagosomes or autolysosomes) at any stage of the autophagic process versus those that measure fl ux through the autophagy pathway (i.e., the complete process including the amount and rate of cargo sequestered and degraded). In particular, a block in macroautophagy that results in autophagosome accumulation must be differentiated from stimuli that increase autophagic activity, defi ned as increased autophagy induction coupled with increased delivery to, and degradation within, lysosomes (inmost higher eukaryotes and some protists such as Dictyostelium ) or the vacuole (in plants and fungi). In other words, it is especially important that investigators new to the fi eld understand that the appearance of more autophagosomes does not necessarily equate with more autophagy. In fact, in many cases, autophagosomes accumulate because of a block in trafficking to lysosomes without a concomitant change in autophagosome biogenesis, whereas an increase in autolysosomes may reflect a reduction in degradative activity. It is worth emphasizing here that lysosomal digestion is a stage of autophagy and evaluating its competence is a crucial part of the evaluation of autophagic flux, or complete autophagy. Here, we present a set of guidelines for the selection and interpretation of methods for use by investigators who aim to examine macroautophagy and related processes, as well as for reviewers who need to provide realistic and reasonable critiques of papers that are focused on these processes. These guidelines are not meant to be a formulaic set of rules, because the appropriate assays depend in part on the question being asked and the system being used. In addition, we emphasize that no individual assay is guaranteed to be the most appropriate one in every situation, and we strongly recommend the use of multiple assays to monitor autophagy. Along these lines, because of the potential for pleiotropic effects due to blocking autophagy through genetic manipulation it is imperative to delete or knock down more than one autophagy-related gene. In addition, some individual Atg proteins, or groups of proteins, are involved in other cellular pathways so not all Atg proteins can be used as a specific marker for an autophagic process. In these guidelines, we consider these various methods of assessing autophagy and what information can, or cannot, be obtained from them. Finally, by discussing the merits and limits of particular autophagy assays, we hope to encourage technical innovation in the field

    Nations within a nation: variations in epidemiological transition across the states of India, 1990–2016 in the Global Burden of Disease Study

    Get PDF
    18% of the world's population lives in India, and many states of India have populations similar to those of large countries. Action to effectively improve population health in India requires availability of reliable and comprehensive state-level estimates of disease burden and risk factors over time. Such comprehensive estimates have not been available so far for all major diseases and risk factors. Thus, we aimed to estimate the disease burden and risk factors in every state of India as part of the Global Burden of Disease (GBD) Study 2016
    corecore